Add session shutdown logic to controller when drained#2573
Conversation
54c9c78 to
9779824
Compare
There was a problem hiding this comment.
Pull request overview
This PR implements device maintenance workflow by adding session shutdown logic when a device status is set to "Drained". It introduces a new DeviceStatusDrained state and modifies the controller template to automatically shut down BGP sessions, MSDP neighbors, and ISIS routing when a device enters this maintenance state.
- Added
DeviceStatusDrainedstatus constant to the device status enumeration - Modified
tunnel.tmplto conditionally shutdown routing protocols (BGP, MSDP, ISIS) when device status is "drained" - Added comprehensive test coverage with fixture file to verify drained device configuration
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
| smartcontract/sdk/go/serviceability/state.go | Adds new DeviceStatusDrained constant and string representation |
| controlplane/controller/internal/controller/templates/tunnel.tmpl | Implements conditional shutdown logic for routing protocols based on device status |
| controlplane/controller/internal/controller/render_test.go | Adds test case for drained device configuration rendering |
| controlplane/controller/internal/controller/fixtures/base.config.drained.txt | Provides expected output fixture for drained device test |
| CHANGELOG.md | Documents the new feature |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
controlplane/controller/internal/controller/templates/tunnel.tmpl
Outdated
Show resolved
Hide resolved
controlplane/controller/internal/controller/templates/tunnel.tmpl
Outdated
Show resolved
Hide resolved
controlplane/controller/internal/controller/templates/tunnel.tmpl
Outdated
Show resolved
Hide resolved
controlplane/controller/internal/controller/templates/tunnel.tmpl
Outdated
Show resolved
Hide resolved
controlplane/controller/internal/controller/templates/tunnel.tmpl
Outdated
Show resolved
Hide resolved
controlplane/controller/internal/controller/templates/tunnel.tmpl
Outdated
Show resolved
Hide resolved
059db68 to
1a54fec
Compare
|
@martinsander00 - what happens when a device becomes active again? From what I can tell from the code, there is no logic to run |
I believe the "no router msdp" should take care of this for msdp, but the same issue exists for the isis overload bit, we need to add "no router isis" to handle that. Also it would be good to un-drain the device in the e2e test to prove it works. |
controlplane/controller/internal/controller/templates/tunnel.tmpl
Outdated
Show resolved
Hide resolved
@martinsander00 note that we can't do the same 'no router bgp` as we don't fully own the configuration for BGP i.e. the contributor often adds their own configuration. |
8276e61 to
67e27f9
Compare
controlplane/controller/internal/controller/templates/tunnel.tmpl
Outdated
Show resolved
Hide resolved
7f71ef7 to
acdbf69
Compare
76f6ed5 to
b3f6b71
Compare
for BGP the template does this BEFORE the shutdown check: for msdp it does: This means neighbors/peers are recreated fresh each time right? So when un-draining, the neighbor is deleted and recreated without shutdown, effectively un-shutting it. |
36c6559 to
70d481b
Compare
70d481b to
35dbe28
Compare
0707ff0 to
ea030c1
Compare
controlplane/controller/internal/controller/templates/tunnel.tmpl
Outdated
Show resolved
Hide resolved
controlplane/controller/internal/controller/templates/tunnel.tmpl
Outdated
Show resolved
Hide resolved
controlplane/controller/internal/controller/templates/tunnel.tmpl
Outdated
Show resolved
Hide resolved
e4604db to
ba31225
Compare
ba31225 to
e87e173
Compare
Resolves: malbeclabs#2485 1. Add incremental changes: set-overload-bit formatting 2. Add incremental changes: add else branch to set-overload-bit 4. Add incremental changes: add e2e test undrained Resolve conflicts Address Steve's comments
e87e173 to
cdb4717
Compare
Resolves: #2485 ## Summary of Changes * Added logic to `tunnel.tmpl` that shuts down user BGP, IBGP sessions, MSDP neighbors, and ISIS when `device.status` is `Drained` * Added test case and fixture file to verify drained device configuration * This implements the device maintenance workflow from RFC-12 (Network Provisioning Framework) ## Testing Verification * Added unit test `render_drained_device_config_successfully` that verifies shutdown commands are included in rendered config when device status is Drained I went through the classic process of draining and undraining a device by establishing an IBRL tunnel from one of our bm hosts to a dn dzd. This was the verification: ``` (base) ubuntu@chi-dn-bm1:~$ doublezero status Tunnel Status | Last Session Update | Tunnel Name | Tunnel Src | Tunnel Dst | Doublezero IP | User Type | Current Device | Lowest Latency Device | Metro | Network unknown | 1970-01-01 00:00:00 UTC | doublezero0 | 137.174.145.138 | 100.0.0.1 | 137.174.145.138 | IBRL | chi-dn-dzd1 | N/A | Chicago | devnet (base) ubuntu@chi-dn-bm1:~$ ---------------------------------------------------------------------- (base) ubuntu@chi-dn-bm1:~$ doublezero latency Pubkey | Code | IP | Min | Max | Avg | reachable JATksU22Uc6uwJ5bQvEisf3XWFJAtJrdh3n7eSNmrK7C | test123 | 1.2.3.4 | 0.00ms | 0.00ms | 0.00ms | false 4CkvmyquGN4qtXLNj3hpJcqYbb7PCanLbU1rQHHdp6xp | chi-dn-dzd3-delete-me | 0.1.2.3 | 0.00ms | 0.00ms | 0.00ms | false 9rSYq2eyR5sPiu5Ug5bBP8AVNtXf1rD59pbAgrT6Yx5r | chi-dn-dzd3 | 100.0.0.33 | 0.00ms | 0.00ms | 0.00ms | false 3JT6EPj4ESTRevv6MadpLYLvijBVDTQXhuHWuZzFgNfV | dz-test | 1.2.3.7 | 0.00ms | 0.00ms | 0.00ms | false 7g6TT8RU2iBKaWzAxBx87S4aYq5HMztTA1vedQmMpREZ | test789 | 1.2.3.6 | 0.00ms | 0.00ms | 0.00ms | false 7sk4SevuKLWNDLDjCy8m9bMk9MtXPDxmL5TQrchDPeca | chi-dn-dzd4 | 100.0.0.49 | 0.00ms | 0.00ms | 0.00ms | false Cu9n4EreVz2iUieSAyLxbLMtcKCTzggLomn68oUge5ww | test456 | 1.2.3.5 | 0.00ms | 0.00ms | 0.00ms | false 3cSe5iowxN1tzTXKHS9DH8PofiyuHyLg5X3GXD5aF6ri | chi-dn-dzd2 | 100.0.0.17 | 0.00ms | 0.00ms | 0.00ms | false ---------------------------------------------------------------------- chi-dn-dzd1#show running-config | section isis schedule isis-upload interval 360 timeout 2 max-log-files 100 command bash /mnt/flash/upload-wrapper.sh management api models provider isis link-state-database interface Loopback255 isis enable 1 interface Loopback256 isis enable 1 interface Switch1/11/2 isis enable 1 isis circuit-type level-2 isis hello-interval 1 isis metric 1000 isis hello padding isis network point-to-point interface Switch1/11/4 isis enable 1 isis circuit-type level-2 isis hello-interval 1 isis metric 1000 isis hello padding isis network point-to-point interface Switch1/12/3 isis circuit-type level-2 isis metric 1 isis network point-to-point router isis 1 net 49.0000.ac10.0006.0000.00 router-id ipv4 172.16.0.6 log-adjacency-changes set-overload-bit ! address-family ipv4 unicast ! segment-routing mpls no shutdown ``` After I undrain overload-bit is no longer there: ``` router isis 1 net 49.0000.ac10.0006.0000.00 router-id ipv4 172.16.0.6 log-adjacency-changes ! address-family ipv4 unicast ! segment-routing mpls no shutdown ```
Resolves: #2485 ## Summary of Changes * Added logic to `tunnel.tmpl` that shuts down user BGP, IBGP sessions, MSDP neighbors, and ISIS when `device.status` is `Drained` * Added test case and fixture file to verify drained device configuration * This implements the device maintenance workflow from RFC-12 (Network Provisioning Framework) ## Testing Verification * Added unit test `render_drained_device_config_successfully` that verifies shutdown commands are included in rendered config when device status is Drained I went through the classic process of draining and undraining a device by establishing an IBRL tunnel from one of our bm hosts to a dn dzd. This was the verification: ``` (base) ubuntu@chi-dn-bm1:~$ doublezero status Tunnel Status | Last Session Update | Tunnel Name | Tunnel Src | Tunnel Dst | Doublezero IP | User Type | Current Device | Lowest Latency Device | Metro | Network unknown | 1970-01-01 00:00:00 UTC | doublezero0 | 137.174.145.138 | 100.0.0.1 | 137.174.145.138 | IBRL | chi-dn-dzd1 | N/A | Chicago | devnet (base) ubuntu@chi-dn-bm1:~$ ---------------------------------------------------------------------- (base) ubuntu@chi-dn-bm1:~$ doublezero latency Pubkey | Code | IP | Min | Max | Avg | reachable JATksU22Uc6uwJ5bQvEisf3XWFJAtJrdh3n7eSNmrK7C | test123 | 1.2.3.4 | 0.00ms | 0.00ms | 0.00ms | false 4CkvmyquGN4qtXLNj3hpJcqYbb7PCanLbU1rQHHdp6xp | chi-dn-dzd3-delete-me | 0.1.2.3 | 0.00ms | 0.00ms | 0.00ms | false 9rSYq2eyR5sPiu5Ug5bBP8AVNtXf1rD59pbAgrT6Yx5r | chi-dn-dzd3 | 100.0.0.33 | 0.00ms | 0.00ms | 0.00ms | false 3JT6EPj4ESTRevv6MadpLYLvijBVDTQXhuHWuZzFgNfV | dz-test | 1.2.3.7 | 0.00ms | 0.00ms | 0.00ms | false 7g6TT8RU2iBKaWzAxBx87S4aYq5HMztTA1vedQmMpREZ | test789 | 1.2.3.6 | 0.00ms | 0.00ms | 0.00ms | false 7sk4SevuKLWNDLDjCy8m9bMk9MtXPDxmL5TQrchDPeca | chi-dn-dzd4 | 100.0.0.49 | 0.00ms | 0.00ms | 0.00ms | false Cu9n4EreVz2iUieSAyLxbLMtcKCTzggLomn68oUge5ww | test456 | 1.2.3.5 | 0.00ms | 0.00ms | 0.00ms | false 3cSe5iowxN1tzTXKHS9DH8PofiyuHyLg5X3GXD5aF6ri | chi-dn-dzd2 | 100.0.0.17 | 0.00ms | 0.00ms | 0.00ms | false ---------------------------------------------------------------------- chi-dn-dzd1#show running-config | section isis schedule isis-upload interval 360 timeout 2 max-log-files 100 command bash /mnt/flash/upload-wrapper.sh management api models provider isis link-state-database interface Loopback255 isis enable 1 interface Loopback256 isis enable 1 interface Switch1/11/2 isis enable 1 isis circuit-type level-2 isis hello-interval 1 isis metric 1000 isis hello padding isis network point-to-point interface Switch1/11/4 isis enable 1 isis circuit-type level-2 isis hello-interval 1 isis metric 1000 isis hello padding isis network point-to-point interface Switch1/12/3 isis circuit-type level-2 isis metric 1 isis network point-to-point router isis 1 net 49.0000.ac10.0006.0000.00 router-id ipv4 172.16.0.6 log-adjacency-changes set-overload-bit ! address-family ipv4 unicast ! segment-routing mpls no shutdown ``` After I undrain overload-bit is no longer there: ``` router isis 1 net 49.0000.ac10.0006.0000.00 router-id ipv4 172.16.0.6 log-adjacency-changes ! address-family ipv4 unicast ! segment-routing mpls no shutdown ```
Resolves: #2485 ## Summary of Changes * Added logic to `tunnel.tmpl` that shuts down user BGP, IBGP sessions, MSDP neighbors, and ISIS when `device.status` is `Drained` * Added test case and fixture file to verify drained device configuration * This implements the device maintenance workflow from RFC-12 (Network Provisioning Framework) ## Testing Verification * Added unit test `render_drained_device_config_successfully` that verifies shutdown commands are included in rendered config when device status is Drained I went through the classic process of draining and undraining a device by establishing an IBRL tunnel from one of our bm hosts to a dn dzd. This was the verification: ``` (base) ubuntu@chi-dn-bm1:~$ doublezero status Tunnel Status | Last Session Update | Tunnel Name | Tunnel Src | Tunnel Dst | Doublezero IP | User Type | Current Device | Lowest Latency Device | Metro | Network unknown | 1970-01-01 00:00:00 UTC | doublezero0 | 137.174.145.138 | 100.0.0.1 | 137.174.145.138 | IBRL | chi-dn-dzd1 | N/A | Chicago | devnet (base) ubuntu@chi-dn-bm1:~$ ---------------------------------------------------------------------- (base) ubuntu@chi-dn-bm1:~$ doublezero latency Pubkey | Code | IP | Min | Max | Avg | reachable JATksU22Uc6uwJ5bQvEisf3XWFJAtJrdh3n7eSNmrK7C | test123 | 1.2.3.4 | 0.00ms | 0.00ms | 0.00ms | false 4CkvmyquGN4qtXLNj3hpJcqYbb7PCanLbU1rQHHdp6xp | chi-dn-dzd3-delete-me | 0.1.2.3 | 0.00ms | 0.00ms | 0.00ms | false 9rSYq2eyR5sPiu5Ug5bBP8AVNtXf1rD59pbAgrT6Yx5r | chi-dn-dzd3 | 100.0.0.33 | 0.00ms | 0.00ms | 0.00ms | false 3JT6EPj4ESTRevv6MadpLYLvijBVDTQXhuHWuZzFgNfV | dz-test | 1.2.3.7 | 0.00ms | 0.00ms | 0.00ms | false 7g6TT8RU2iBKaWzAxBx87S4aYq5HMztTA1vedQmMpREZ | test789 | 1.2.3.6 | 0.00ms | 0.00ms | 0.00ms | false 7sk4SevuKLWNDLDjCy8m9bMk9MtXPDxmL5TQrchDPeca | chi-dn-dzd4 | 100.0.0.49 | 0.00ms | 0.00ms | 0.00ms | false Cu9n4EreVz2iUieSAyLxbLMtcKCTzggLomn68oUge5ww | test456 | 1.2.3.5 | 0.00ms | 0.00ms | 0.00ms | false 3cSe5iowxN1tzTXKHS9DH8PofiyuHyLg5X3GXD5aF6ri | chi-dn-dzd2 | 100.0.0.17 | 0.00ms | 0.00ms | 0.00ms | false ---------------------------------------------------------------------- chi-dn-dzd1#show running-config | section isis schedule isis-upload interval 360 timeout 2 max-log-files 100 command bash /mnt/flash/upload-wrapper.sh management api models provider isis link-state-database interface Loopback255 isis enable 1 interface Loopback256 isis enable 1 interface Switch1/11/2 isis enable 1 isis circuit-type level-2 isis hello-interval 1 isis metric 1000 isis hello padding isis network point-to-point interface Switch1/11/4 isis enable 1 isis circuit-type level-2 isis hello-interval 1 isis metric 1000 isis hello padding isis network point-to-point interface Switch1/12/3 isis circuit-type level-2 isis metric 1 isis network point-to-point router isis 1 net 49.0000.ac10.0006.0000.00 router-id ipv4 172.16.0.6 log-adjacency-changes set-overload-bit ! address-family ipv4 unicast ! segment-routing mpls no shutdown ``` After I undrain overload-bit is no longer there: ``` router isis 1 net 49.0000.ac10.0006.0000.00 router-id ipv4 172.16.0.6 log-adjacency-changes ! address-family ipv4 unicast ! segment-routing mpls no shutdown ```
Resolves: #2485 ## Summary of Changes * Added logic to `tunnel.tmpl` that shuts down user BGP, IBGP sessions, MSDP neighbors, and ISIS when `device.status` is `Drained` * Added test case and fixture file to verify drained device configuration * This implements the device maintenance workflow from RFC-12 (Network Provisioning Framework) ## Testing Verification * Added unit test `render_drained_device_config_successfully` that verifies shutdown commands are included in rendered config when device status is Drained I went through the classic process of draining and undraining a device by establishing an IBRL tunnel from one of our bm hosts to a dn dzd. This was the verification: ``` (base) ubuntu@chi-dn-bm1:~$ doublezero status Tunnel Status | Last Session Update | Tunnel Name | Tunnel Src | Tunnel Dst | Doublezero IP | User Type | Current Device | Lowest Latency Device | Metro | Network unknown | 1970-01-01 00:00:00 UTC | doublezero0 | 137.174.145.138 | 100.0.0.1 | 137.174.145.138 | IBRL | chi-dn-dzd1 | N/A | Chicago | devnet (base) ubuntu@chi-dn-bm1:~$ ---------------------------------------------------------------------- (base) ubuntu@chi-dn-bm1:~$ doublezero latency Pubkey | Code | IP | Min | Max | Avg | reachable JATksU22Uc6uwJ5bQvEisf3XWFJAtJrdh3n7eSNmrK7C | test123 | 1.2.3.4 | 0.00ms | 0.00ms | 0.00ms | false 4CkvmyquGN4qtXLNj3hpJcqYbb7PCanLbU1rQHHdp6xp | chi-dn-dzd3-delete-me | 0.1.2.3 | 0.00ms | 0.00ms | 0.00ms | false 9rSYq2eyR5sPiu5Ug5bBP8AVNtXf1rD59pbAgrT6Yx5r | chi-dn-dzd3 | 100.0.0.33 | 0.00ms | 0.00ms | 0.00ms | false 3JT6EPj4ESTRevv6MadpLYLvijBVDTQXhuHWuZzFgNfV | dz-test | 1.2.3.7 | 0.00ms | 0.00ms | 0.00ms | false 7g6TT8RU2iBKaWzAxBx87S4aYq5HMztTA1vedQmMpREZ | test789 | 1.2.3.6 | 0.00ms | 0.00ms | 0.00ms | false 7sk4SevuKLWNDLDjCy8m9bMk9MtXPDxmL5TQrchDPeca | chi-dn-dzd4 | 100.0.0.49 | 0.00ms | 0.00ms | 0.00ms | false Cu9n4EreVz2iUieSAyLxbLMtcKCTzggLomn68oUge5ww | test456 | 1.2.3.5 | 0.00ms | 0.00ms | 0.00ms | false 3cSe5iowxN1tzTXKHS9DH8PofiyuHyLg5X3GXD5aF6ri | chi-dn-dzd2 | 100.0.0.17 | 0.00ms | 0.00ms | 0.00ms | false ---------------------------------------------------------------------- chi-dn-dzd1#show running-config | section isis schedule isis-upload interval 360 timeout 2 max-log-files 100 command bash /mnt/flash/upload-wrapper.sh management api models provider isis link-state-database interface Loopback255 isis enable 1 interface Loopback256 isis enable 1 interface Switch1/11/2 isis enable 1 isis circuit-type level-2 isis hello-interval 1 isis metric 1000 isis hello padding isis network point-to-point interface Switch1/11/4 isis enable 1 isis circuit-type level-2 isis hello-interval 1 isis metric 1000 isis hello padding isis network point-to-point interface Switch1/12/3 isis circuit-type level-2 isis metric 1 isis network point-to-point router isis 1 net 49.0000.ac10.0006.0000.00 router-id ipv4 172.16.0.6 log-adjacency-changes set-overload-bit ! address-family ipv4 unicast ! segment-routing mpls no shutdown ``` After I undrain overload-bit is no longer there: ``` router isis 1 net 49.0000.ac10.0006.0000.00 router-id ipv4 172.16.0.6 log-adjacency-changes ! address-family ipv4 unicast ! segment-routing mpls no shutdown ```
Resolves: #2485
Summary of Changes
tunnel.tmplthat shuts down user BGP, IBGP sessions, MSDP neighbors, and ISIS whendevice.statusisDrainedTesting Verification
render_drained_device_config_successfullythat verifies shutdown commands are included in rendered config when device status is DrainedI went through the classic process of draining and undraining a device by establishing an IBRL tunnel from one of our bm hosts to a dn dzd. This was the verification:
After I undrain overload-bit is no longer there: